Approximate Homogeneous Graph Summarization

Authors

Zheng Liu

Jeffrey Xu Yu

Hong Cheng

Abstract

Graph patterns are able to represent the complex structural relations among objects in many applications in various domains. The objective of graph summarization is to obtain a concise representation of a single large graph, which is interpretable and suitable for analysis. A good summary can reveal the hidden relationships between nodes in a graph. The key issue is how to construct a high-quality and representative super-graph, G_S, in which a super-node summarizes a collection of nodes based on the similarity of attribute values and neighborhood relationships associated with nodes in G, and a super-edge summarizes the edges between nodes in G that are represented by two different super-nodes in G_S. We propose an entropy-based unified model for measuring the homogeneity of the super-graph. The best summary in terms of homogeneity could be too large to explore. By using the unified model, we relax three summarization criteria to obtain an approximate homogeneous summary of reasonable size. We propose both agglomerative and divisive algorithms for approximate summarization, as well as pruning techniques and heuristics for both algorithms to save computation cost. Experimental results confirm that our approaches can efficiently generate high-quality summaries.
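The abstract does not spell out the entropy-based model, so the following is only a minimal sketch of the general idea, assuming plain Shannon entropy over a single categorical attribute. The function names, the size-weighted average, and the toy data are all hypothetical and not taken from the paper. The sketch covers only the attribute side of homogeneity, not neighborhood relationships.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of a list of categorical attribute values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def summary_attribute_entropy(node_attrs, partition):
    """Size-weighted average entropy over super-nodes (assumed aggregation).

    node_attrs: dict mapping node -> attribute value
    partition:  list of super-nodes, each a list of nodes
    A value of 0 means every super-node is perfectly homogeneous.
    """
    n = sum(len(group) for group in partition)
    return sum(
        (len(group) / n) * attribute_entropy([node_attrs[v] for v in group])
        for group in partition
    )


# Toy graph: four nodes with one attribute (hypothetical data).
node_attrs = {"a": "db", "b": "db", "c": "ml", "d": "ml"}
homogeneous = [["a", "b"], ["c", "d"]]  # groups agree on the attribute
mixed = [["a", "c"], ["b", "d"]]        # groups mix attribute values

print(summary_attribute_entropy(node_attrs, homogeneous))
print(summary_attribute_entropy(node_attrs, mixed))
```

Under these assumptions, a lower score indicates a more homogeneous grouping. A score of zero can always be reached by making every node its own super-node, which illustrates why the most homogeneous summary may be too large to explore.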


Similar resources

Graph Hybrid Summarization

One way to process and analyze massive graphs is summarization. Generating a high-quality summary is the main challenge of graph summarization. To generate a better-quality summary of a given attributed graph, both structural and attribute similarities must be considered. There are two measures, density and entropy, to evaluate the quality of structural and at...


Multi-layered graph-based multi-document summarization model

Multi-document summarization is a process of automatic generation of a compressed version of the given collection of documents. Recently, the graph-based models and ranking algorithms have been actively investigated by the extractive document summarization community. While most work to date focuses on homogeneous connectedness of sentences and heterogeneous connectedness of documents and sentence...


Weighted Theta Functions and Embeddings with Applications to Max-Cut, Clustering and Summarization

We introduce a unifying generalization of the Lovász theta function, and the associated geometric embedding, for graphs with weights on both nodes and edges. We show how it can be computed exactly by semidefinite programming, and how to approximate it using SVM computations. We show how the theta function can be interpreted as a measure of diversity in graphs and use this idea, and the graph em...


TLR at DUC 2006: approximate tree similarity and a new evaluation regime

We propose modifications to a summarization system that is based on computing the tree edit distance between dependency parse trees of reformulated questions and candidate sentences. We modify a recently introduced approximate tree edit distance metric by using mutual information between stemmed words for similarity matching of sub-trees. We also propose an approximate way of deriving anaphoric...


Tuple Graph Synopses for Relational Data Sets

This paper introduces the Tuple Graph (TuG) synopses, a new class of data summaries that enable accurate selectivity estimates for complex relational queries. The proposed summarization framework adopts a “semi-structured” view of the relational database, modeling a relational data set as a graph of tuples and join queries as graph traversals respectively. The key idea is to approximate the str...



Volume: 20

Publication date: 2012
